##  [1] "Census Tract"                "Total Population"           
##  [3] "California County"           "ZIP"                        
##  [5] "Approximate Location"        "Longitude"                  
##  [7] "Latitude"                    "CES 4.0 Score"              
##  [9] "CES 4.0 Percentile"          "CES 4.0 Percentile Range"   
## [11] "Ozone"                       "Ozone Pctl"                 
## [13] "PM2.5"                       "PM2.5 Pctl"                 
## [15] "Diesel PM"                   "Diesel PM Pctl"             
## [17] "Drinking Water"              "Drinking Water Pctl"        
## [19] "Lead"                        "Lead Pctl"                  
## [21] "Pesticides"                  "Pesticides Pctl"            
## [23] "Tox. Release"                "Tox. Release Pctl"          
## [25] "Traffic"                     "Traffic Pctl"               
## [27] "Cleanup Sites"               "Cleanup Sites Pctl"         
## [29] "Groundwater Threats"         "Groundwater Threats Pctl"   
## [31] "Haz. Waste"                  "Haz. Waste Pctl"            
## [33] "Imp. Water Bodies"           "Imp. Water Bodies Pctl"     
## [35] "Solid Waste"                 "Solid Waste Pctl"           
## [37] "Pollution Burden"            "Pollution Burden Score"     
## [39] "Pollution Burden Pctl"       "Asthma"                     
## [41] "Asthma Pctl"                 "Low Birth Weight"           
## [43] "Low Birth Weight Pctl"       "Cardiovascular Disease"     
## [45] "Cardiovascular Disease Pctl" "Education"                  
## [47] "Education Pctl"              "Linguistic Isolation"       
## [49] "Linguistic Isolation Pctl"   "Poverty"                    
## [51] "Poverty Pctl"                "Unemployment"               
## [53] "Unemployment Pctl"           "Housing Burden"             
## [55] "Housing Burden Pctl"         "Pop. Char."                 
## [57] "Pop. Char. Score"            "Pop. Char. Pctl"
## [1] 8
## [1] 8

#Based on data averaged over 2015-2017, Western Alameda County, Contra Costa County, and Solano County have the highest rates of Emergency Department (ED) visits for asthma per 10,000 people. From this, CalEnviroScreen infers that these counties have the highest concentrations of people with asthma. Note, however, that the data are likely skewed because they count only people who made an ED visit between 2015 and 2017; the actual rate of people with asthma is probably much higher.

#The annual mean concentration (from 2015 to 2017) of ‘Particulate matter pollution, and fine particle pollution’ (PM2.5) was estimated using weighted averages of measured monitor concentrations (from monitoring sites in California) and satellite observations “derived from Aerosol Optical Depth (AOD) measurements, land use and meteorology data via regression on ground level monitor data” (CalEnviroScreen 4.0). Concentrations are reported in µg/m^3. The areas with the highest annual mean concentration of PM2.5 between 2015 and 2017 are Napa County, Oakland, and San Jose.

#The best-fit line does not appear to fit the data well. At first glance the residual sum of squares looks fairly high: many points lie far from the best-fit line.
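#A rough way to put a number on that first impression is to compute the residual sum of squares directly. The sketch below uses a small simulated data frame (`toy`, with made-up values) in place of `ces4_clean`, so the numbers it produces say nothing about the real data.

```r
# Hypothetical stand-in for ces4_clean: simulated values, not CalEnviroScreen data
set.seed(1)
toy <- data.frame(PM2.5 = runif(200, min = 4, max = 14))
toy$Asthma <- 5 * toy$PM2.5 + rnorm(200, sd = 30)

fit <- lm(Asthma ~ PM2.5, data = toy)

# Residual sum of squares: squared vertical distances from the best-fit line
rss <- sum(residuals(fit)^2)

# Total sum of squares: squared distances from the mean of the response
tss <- sum((toy$Asthma - mean(toy$Asthma))^2)

# Share of the variation left unexplained by the line (equals 1 - R-squared)
unexplained <- rss / tss
```

#A value of `unexplained` close to 1 would match the visual impression of a poorly fitting line.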

## 
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_clean)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -54.47 -25.89  -9.61  12.94 182.95 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37.49 on 1578 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.09606,    Adjusted R-squared:  0.09549 
## F-statistic: 167.7 on 1 and 1578 DF,  p-value: < 2.2e-16

#An increase of 1 unit (1 µg/m^3) of PM2.5 is associated with an increase in Asthma of 19.86 (ED visits per 10,000), and 9.6% of the variation in Asthma (y) is explained by the variation in PM2.5 (x). Based on these results I think we can reject the null hypothesis of no linear relationship. The p-value is below 2.2e-16, well under .05, so there is statistical evidence of an association between the two variables, although the low R-squared means PM2.5 alone explains little of the variation in Asthma.
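#The quantities used in this interpretation can be pulled out of a fitted model rather than read off the printed summary. This is a minimal sketch on simulated data; `toy` and its values are assumptions for illustration, not the real `ces4_clean`.

```r
# Hypothetical stand-in for ces4_clean: simulated values, not CalEnviroScreen data
set.seed(2)
toy <- data.frame(PM2.5 = runif(300, min = 4, max = 14))
toy$Asthma <- 10 + 4 * toy$PM2.5 + rnorm(300, sd = 25)

fit <- lm(Asthma ~ PM2.5, data = toy)
s <- summary(fit)

# Slope: expected change in Asthma for a 1-unit increase in PM2.5
slope <- unname(coef(fit)["PM2.5"])

# R-squared: share of the variation in Asthma (the response) explained by PM2.5
r2 <- s$r.squared

# p-value for the null hypothesis that the slope is zero
p_slope <- s$coefficients["PM2.5", "Pr(>|t|)"]

# Decision at the 5% significance level
reject_null <- p_slope < 0.05
```

#In a one-predictor model `r2` equals the squared correlation between the two variables, which is why R-squared is read as variation in the response explained by the predictor.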

#The mean of the residuals should be close to zero, and their distribution should be bell-shaped. The density curve of the residuals, however, is skewed to the right (the median is -9.61, while the largest residual is 182.95).
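#These residual checks can also be done numerically. The sketch below simulates a right-skewed response to show what the check looks like; the data frame `toy` and the skewness formula chosen here are assumptions for illustration.

```r
# Hypothetical stand-in for ces4_clean: simulated, right-skewed response
set.seed(3)
toy <- data.frame(PM2.5 = runif(300, min = 4, max = 14))
toy$Asthma <- exp(2 + 0.2 * toy$PM2.5 + rnorm(300, sd = 0.6))

fit <- lm(Asthma ~ PM2.5, data = toy)
res <- residuals(fit)

# With an intercept in the model, least-squares residuals average to zero
res_mean <- mean(res)

# A median well below the mean points to a long right tail
res_median <- median(res)

# Moment-based sample skewness: positive values indicate right skew
res_skew <- mean((res - res_mean)^3) / mean((res - res_mean)^2)^1.5

# Kernel density estimate of the residuals; plot(res_density) draws the curve
res_density <- density(res)
```

#Because the mean is zero by construction, the median and the skewness are the more informative summaries of how far the residuals are from a bell shape.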

## 
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00402 -0.46479  0.03313  0.42298  1.75525 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.69234    0.22840   3.031  0.00248 ** 
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6566 on 1578 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1003, Adjusted R-squared:  0.09974 
## F-statistic: 175.9 on 1 and 1578 DF,  p-value: < 2.2e-16

#An increase of 1 unit of PM2.5 is associated with Asthma being multiplied by about 1.43 (exp(0.356)), that is, roughly a 43% increase; 10% of the variation in log(Asthma) (y) is explained by the variation in PM2.5 (x).
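#Because the response was log-transformed, the slope has to be exponentiated before it can be read on the original Asthma scale. The short calculation below uses the PM2.5 coefficient from the summary above; the variable names are made up for illustration.

```r
# Slope reported in the log(Asthma) ~ PM2.5 summary above
b_log <- 0.35633

# Multiplicative change in Asthma per 1-unit increase in PM2.5
factor_per_unit <- exp(b_log)

# The same effect expressed as a percentage change
pct_change <- (factor_per_unit - 1) * 100

round(factor_per_unit, 2)  # 1.43
```

#The effect is multiplicative, so it does not correspond to a fixed number of additional ED visits per 10,000.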

#Now the residuals are centered closer to zero (the median is 0.03, compared with -9.61 before). The distribution still does not look normal, because there are two peaks and the right tail is a bit longer than the left.

#The two areas with the most negative residual values are in Stanford, Santa Clara County. A negative residual implies an overestimate (residual = observed value - predicted value), and these census tracts may have negative residuals because of inconsistent population data. A large portion of the population could be students, who generally enter and leave the area in bulk. Over the two-year period in which the data were collected, a large portion of the population may have changed, resulting in skewed data.
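#One way to locate the tracts with the most negative residuals is to attach the residuals to the data and sort. This is a sketch on simulated data: the `tract` column, the values, and the choice of the log model are assumptions, and the real data identify tracts by `Census Tract`.

```r
# Hypothetical stand-in for ces4_clean: simulated values, not CalEnviroScreen data
set.seed(4)
toy <- data.frame(
  tract = sprintf("T%03d", 1:100),
  PM2.5 = runif(100, min = 4, max = 14)
)
toy$Asthma <- exp(2 + 0.2 * toy$PM2.5 + rnorm(100, sd = 0.6))
toy$Asthma[10] <- NA  # one missing value, as in the real summary output

# na.exclude pads residuals with NA so they line up with the rows of toy
fit <- lm(log(Asthma) ~ PM2.5, data = toy, na.action = na.exclude)
toy$residual <- residuals(fit)

# Rows with the two most negative residuals (observed minus predicted)
# order() sorts ascending and places NA last
most_negative <- head(toy[order(toy$residual), ], 2)
```

#Using `na.action = na.exclude` matters here: with the default, the row containing the missing value is dropped and the residual vector is one element shorter than the data frame.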